moral graph


Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables

Neural Information Processing Systems

Parviainen et al. (2014) adopted an anytime integer linear programming (ILP) … Otherwise, it returns a sub-optimal DAG with bounded treewidth. Nie et al. (2014) proposed an efficient anytime ILP approach with a polynomial number of constraints … Nie et al. (2015) proposed the method S2.


Advances in Learning Bayesian Networks of Bounded Treewidth

Neural Information Processing Systems

This work presents novel algorithms for learning Bayesian networks of bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed integer linear programming formulations for structure learning and treewidth computation. The approximate method consists of sampling k-trees (maximal graphs of treewidth k) and subsequently selecting, exactly or approximately, the best structure whose moral graph is a subgraph of that k-tree. The approaches are empirically compared to each other and to state-of-the-art methods on a collection of public data sets with up to 100 variables.
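The k-tree idea in the approximate method can be illustrated with a small sketch. The code below grows a random k-tree (start from a (k+1)-clique, then repeatedly attach a new node to an existing k-clique), builds the moral graph of a DAG by dropping arc directions and connecting co-parents, and checks whether that moral graph is a subgraph of the k-tree. The function names and the growth procedure are illustrative assumptions: this simple process does not sample k-trees uniformly and is not the authors' sampler, and the search for the best-scoring DAG is omitted.

```python
import itertools
import random


def random_k_tree(n, k, seed=0):
    """Grow a random k-tree on nodes 0..n-1 (assumes k >= 1 and n >= k + 1).

    Hypothetical helper: starts from a (k+1)-clique and joins each further node
    to all members of a randomly chosen existing k-clique. NOT a uniform sampler.
    """
    rng = random.Random(seed)
    base = list(range(k + 1))
    edges = {frozenset(e) for e in itertools.combinations(base, 2)}
    cliques = [frozenset(c) for c in itertools.combinations(base, k)]
    for new in range(k + 1, n):
        clique = rng.choice(cliques)
        edges.update(frozenset((u, new)) for u in clique)
        # every k-subset of clique | {new} that contains `new` is a fresh k-clique
        cliques.extend(frozenset(c) | {new}
                       for c in itertools.combinations(sorted(clique), k - 1))
    return edges


def moral_graph(parents):
    """Moral graph of a DAG given as {child: set_of_parents}, as undirected edges."""
    edges = set()
    for child, pa in parents.items():
        edges.update(frozenset((p, child)) for p in pa)  # drop arc directions
        edges.update(frozenset(pq) for pq in itertools.combinations(sorted(pa), 2))  # marry co-parents
    return edges


def fits_in(parents, host_edges):
    """True if the DAG's moral graph is a subgraph of the host graph (e.g. a k-tree)."""
    return moral_graph(parents) <= host_edges


if __name__ == "__main__":
    host = random_k_tree(6, 2, seed=1)
    print(fits_in({2: {0, 1}}, host))     # True: the v-structure lies on the base triangle
    print(fits_in({3: {0, 1, 2}}, host))  # False: three parents give a K4 in the moral graph
```

Because a k-tree has no clique with more than k + 1 nodes, any node with k + 1 or more parents is ruled out automatically; this is one way a k-tree caps the treewidth of every DAG that fits inside it.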


Revisiting Differentiable Structure Learning: Inconsistency of $\ell_1$ Penalty and Beyond

arXiv.org Machine Learning

Recent advances in differentiable structure learning have framed the combinatorial problem of learning directed acyclic graphs as a continuous optimization problem. Various aspects, including data standardization, have been studied to identify factors that influence the empirical performance of these methods. In this work, we investigate critical limitations in differentiable structure learning methods, focusing on settings where the true structure can be identified only up to its Markov equivalence class, particularly in the linear Gaussian case. While Ng et al. (2024) highlighted potential non-convexity issues in this setting, we demonstrate and explain why the use of $\ell_1$-penalized likelihood in such cases is fundamentally inconsistent, even if the global optimum of the optimization problem can be found. To resolve this limitation, we develop a hybrid differentiable structure learning method based on $\ell_0$-penalized likelihood with a hard acyclicity constraint, where the $\ell_0$ penalty can be approximated by different techniques, including Gumbel-Softmax. Specifically, we first estimate the underlying moral graph and use it to restrict the search space of the optimization problem, which helps alleviate the non-convexity issue. Experimental results show that the proposed method enhances empirical performance both before and after data standardization, providing a more reliable path for future advancements in differentiable structure learning, especially for learning Markov equivalence classes.
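The moral-graph step can be made concrete in the population setting the abstract focuses on. For a linear Gaussian model $X = W^\top X + N$ with independent noise of variances $D = \mathrm{diag}(d)$, the precision matrix is $\Theta = (I - W) D^{-1} (I - W)^\top$, and for generic edge weights its off-diagonal non-zeros are exactly the edges of the moral graph. The sketch below computes $\Theta$ from a known $W$, reads off the moral graph, and lists the directed edges that remain as candidates. It is a hypothetical illustration under these assumptions, not the paper's estimator: with finite samples the precision matrix would have to be estimated from data, and the $\ell_0$-penalized optimization itself is not shown.

```python
def precision_from_sem(W, d):
    """Population precision matrix of the linear Gaussian SEM X = W^T X + N.

    W[i][j] is the weight of edge i -> j, d[j] the noise variance of X_j.
    Returns Theta = (I - W) D^{-1} (I - W)^T as nested lists.
    """
    p = len(W)
    A = [[(1.0 if i == j else 0.0) - W[i][j] for j in range(p)] for i in range(p)]
    return [[sum(A[j][l] * A[k][l] / d[l] for l in range(p)) for k in range(p)]
            for j in range(p)]


def support_edges(theta, tol=1e-12):
    """Off-diagonal non-zeros of the precision matrix, as pairs (j, k) with j < k."""
    p = len(theta)
    return {(j, k) for j in range(p) for k in range(j + 1, p) if abs(theta[j][k]) > tol}


def candidate_arcs(edges):
    """Directed edges an optimizer may still use: both orientations of each moral edge."""
    return {arc for (a, b) in edges for arc in ((a, b), (b, a))}


if __name__ == "__main__":
    ones = [1.0, 1.0, 1.0]
    collider = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]  # 0 -> 2 <- 1
    chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]     # 0 -> 1 -> 2
    print(sorted(support_edges(precision_from_sem(collider, ones))))  # [(0, 1), (0, 2), (1, 2)]
    print(sorted(support_edges(precision_from_sem(chain, ones))))     # [(0, 1), (1, 2)]
```

The collider produces the extra "married" edge (0, 1) even though 0 and 1 are not adjacent in the DAG, while the chain's moral graph equals its skeleton; in both cases the optimizer only needs to consider arcs inside the moral graph.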


Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency

arXiv.org Machine Learning

To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large-scale data. Unfortunately, the space of all possible causal graphs is enormous, so searching it scalably and accurately for the best fit to the data is a challenge. In principle, we could substantially reduce the search space, or learn the graph entirely, by testing the conditional independence of variables. However, deciding whether two variables are adjacent in a causal graph may require an exponential number of tests. Here we build a scalable and flexible method to evaluate whether two variables are adjacent in a causal graph, the Differentiable Adjacency Test (DAT). DAT replaces an exponential number of tests with a provably equivalent relaxed problem, which it then solves by training two neural networks. We build a graph learning method based on DAT, DAT-Graph, that can also learn from data with interventions. DAT-Graph can learn graphs of 1000 variables with state-of-the-art accuracy. Using the graph learned by DAT-Graph, we also build models that make much more accurate predictions of the effects of interventions on large-scale RNA sequencing data.
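To see why adjacency testing can require exponentially many tests, consider the exhaustive baseline that DAT is designed to avoid (this is not DAT itself). Assuming a faithful linear Gaussian model, two variables are non-adjacent if and only if some subset of the remaining variables makes their partial correlation zero, so a brute-force test may have to try every one of the $2^{p-2}$ subsets. The sketch computes partial correlations from a population correlation matrix with the standard recursive formula; the function names are hypothetical.

```python
import itertools
import math


def partial_corr(R, x, y, Z):
    """Gaussian partial correlation of x and y given the list Z (recursive formula)."""
    if not Z:
        return R[x][y]
    z, rest = Z[0], Z[1:]
    rxy = partial_corr(R, x, y, rest)
    rxz = partial_corr(R, x, z, rest)
    ryz = partial_corr(R, y, z, rest)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def adjacent(R, x, y, tol=1e-9):
    """Exhaustive adjacency test: returns (is_adjacent, number_of_tests_run).

    x and y are declared non-adjacent as soon as some conditioning set S drawn
    from the other variables gives a (near-)zero partial correlation.
    """
    others = [v for v in range(len(R)) if v not in (x, y)]
    tests = 0
    for size in range(len(others) + 1):
        for S in itertools.combinations(others, size):
            tests += 1
            if abs(partial_corr(R, x, y, list(S))) < tol:
                return False, tests
    return True, tests


if __name__ == "__main__":
    # Population correlations of a chain 0 -> 1 -> 2 with edge correlations 0.5.
    R = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
    print(adjacent(R, 0, 2))  # (False, 2): separated by {1}
    print(adjacent(R, 0, 1))  # (True, 2)
```

Confirming an adjacency means exhausting every subset, so with 1000 variables the worst case is $2^{998}$ conditioning sets per pair; this is the enumeration that a relaxed, learnable formulation sidesteps.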


What are human values, and how do we align AI to them?

arXiv.org Artificial Intelligence

There is an emerging consensus that we need to align AI systems with human values (Gabriel, 2020; Ji et al., 2024), but it remains unclear how to apply this to language models in practice. We split the problem of "aligning to human values" into three parts: first, eliciting values from people; second, reconciling those values into an alignment target for training ML models; and third, actually training the model. In this paper, we focus on the first two parts and ask the question: what are "good" ways to synthesize diverse human inputs about values into a target for aligning language models? To answer this question, we first define a set of 6 criteria that we believe must be satisfied for an alignment target to shape model behavior in accordance with human values. We then propose a process for eliciting and reconciling values called Moral Graph Elicitation (MGE), which uses a large language model to interview participants about their values in particular contexts; our approach is inspired by the philosophy of values advanced by Taylor (1977), Chang (2004), and others. We trial MGE with a representative sample of 500 Americans on 3 intentionally divisive prompts (e.g., advice about abortion). Our results demonstrate that MGE is promising for improving model alignment across all 6 criteria. For example, almost all participants (89.1%) felt well represented by the process, and 89% thought the final moral graph was fair, even if their value wasn't voted as the wisest. Our process often results in "expert" values (e.g., values from women who have solicited abortion advice) rising to the top of the moral graph, without defining who is considered an expert in advance.


Advances in Learning Bayesian Networks of Bounded Treewidth

Denis D. Mauá; Rensselaer Polytechnic Institute, Troy, NY, USA; University of São Paulo

Neural Information Processing Systems


Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables

Neural Information Processing Systems

We present a method for learning treewidth-bounded Bayesian networks from data sets containing thousands of variables. Bounding the treewidth of a Bayesian network greatly reduces the complexity of inference. Yet, because treewidth is a global property of the graph, bounding it considerably increases the difficulty of the learning process. Our novel algorithm accomplishes this task, scaling both to large domains and to large treewidths, and it consistently outperforms the state of the art in experiments with up to thousands of variables.
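Bounding treewidth matters because exact inference (for example with a junction tree) is exponential in the treewidth of the network's moral graph. Computing treewidth exactly is NP-hard, but greedy elimination orderings give cheap upper bounds. The sketch below uses the min-degree heuristic, a common choice assumed here for illustration and not necessarily what the authors use: repeatedly eliminate a vertex of minimum degree, connect its neighbours, and record the largest neighbourhood seen.

```python
def treewidth_upper_bound(adj):
    """Min-degree elimination heuristic: an upper bound on the treewidth.

    adj maps each node to the set of its neighbours in an undirected graph
    (e.g. a moral graph). Node labels must be mutually comparable.
    """
    g = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    width = 0
    while g:
        v = min(g, key=lambda u: (len(g[u]), u))  # min degree, ties by label
        nbrs = g.pop(v)
        width = max(width, len(nbrs))
        for a in nbrs:
            g[a].discard(v)
            g[a] |= nbrs - {a}  # fill-in: v's neighbours become a clique
    return width


if __name__ == "__main__":
    cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
    print(treewidth_upper_bound(cycle4))  # 2
    print(treewidth_upper_bound(k4))      # 3
```

The heuristic only certifies an upper bound; a learner that must guarantee a bound typically needs either an exact treewidth computation or a construction (such as building within a k-tree) that enforces the bound by design.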


An algorithm for reconstruction of triangle-free linear dynamic networks with verification of correctness

arXiv.org Machine Learning

Reconstructing a network of dynamic systems from observational data is an active area of research. Many approaches guarantee a consistent reconstruction under the relatively strong assumption that the network dynamics are governed by strictly causal transfer functions. However, in many practical scenarios, strictly causal models are not adequate to describe the system, and it is necessary to consider models whose dynamics include direct feedthrough terms. In the presence of direct feedthroughs, guaranteeing a consistent reconstruction is more challenging. Indeed, we prove that without additional assumptions on the network, even in the limit of infinite data, any reconstruction method is susceptible to inferring edges that do not exist in the true network (false positives) or failing to detect edges that are present in the network (false negatives). However, for a class of triangle-free networks introduced in this article, some consistency guarantees can be provided. We present a method that either exactly recovers the topology of a triangle-free network, certifying its correctness, or outputs a graph that is sparser than the topology of the actual network, specifying that this graph has no false positives but does have false negatives.
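A triangle-free graph is one with no three pairwise-adjacent nodes. The sketch below checks this property on the undirected skeleton of a directed network given as a list of (source, target) pairs. It uses only the standard graph-theoretic notion; the class of triangle-free networks defined in the article may be specified differently, so the helper names and the reduction to the skeleton are assumptions.

```python
import itertools


def skeleton(arcs):
    """Undirected adjacency {node: set_of_neighbours} of a directed network."""
    adj = {}
    for u, v in arcs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_triangle_free(adj):
    """True if no node has two neighbours that are themselves adjacent."""
    for v, nbrs in adj.items():
        for a, b in itertools.combinations(sorted(nbrs), 2):
            if b in adj[a]:  # v, a, b are pairwise adjacent
                return False
    return True


if __name__ == "__main__":
    print(is_triangle_free(skeleton([(0, 1), (1, 2), (2, 3), (3, 0)])))  # True
    print(is_triangle_free(skeleton([(0, 1), (1, 2), (2, 0)])))          # False
```

A feedback pair such as 0 -> 1 and 1 -> 0 collapses to a single undirected edge in the skeleton, so it does not by itself create a triangle under this simplified check.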


Integer Programming for Learning Directed Acyclic Graphs from Continuous Data

arXiv.org Machine Learning

Learning directed acyclic graphs (DAGs) from data is a challenging task both in theory and in practice, because the number of possible DAGs scales superexponentially with the number of nodes. In this paper, we study the problem of learning an optimal DAG from continuous observational data. We cast this problem in the form of a mathematical programming model which can naturally incorporate a super-structure in order to reduce the set of possible candidate DAGs. We use the penalized negative log-likelihood score function with both $\ell_0$ and $\ell_1$ regularizations and propose a new mixed-integer quadratic optimization (MIQO) model, referred to as a layered network (LN) formulation. The LN formulation is a compact model, which enjoys as tight an optimal continuous relaxation value as the stronger but larger formulations under a mild condition. Computational results indicate that the proposed formulation outperforms existing mathematical formulations and scales better than available algorithms that can solve the same problem with only $\ell_1$ regularization. In particular, the LN formulation clearly outperforms existing methods in terms of computational time needed to find an optimal DAG in the presence of a sparse super-structure.
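The superexponential growth of the DAG space, and the effect of a super-structure on it, can be checked numerically. The sketch below counts labelled DAGs with Robinson's recurrence $a(n) = \sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k} 2^{k(n-k)} a(n-k)$, $a(0) = 1$, and, for tiny graphs, counts by brute force how many DAGs remain when the skeleton must lie inside a given undirected super-structure. It only illustrates the size of the search space; it is not the LN formulation or any MIQO model, and the function names are hypothetical.

```python
import itertools
from math import comb


def count_dags(n):
    """Number of labelled DAGs on n nodes, via Robinson's recurrence."""
    a = [1]  # a[0] = 1 (the empty graph)
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]


def is_acyclic(n, arcs):
    """Kahn's algorithm: True if the directed graph on nodes 0..n-1 has no cycle."""
    indeg = [0] * n
    out = [[] for _ in range(n)]
    for u, v in arcs:
        out[u].append(v)
        indeg[v] += 1
    stack = [v for v in range(n) if indeg[v] == 0]
    removed = 0
    while stack:
        u = stack.pop()
        removed += 1
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return removed == n


def count_dags_within(n, super_edges):
    """Brute force: number of DAGs whose skeleton lies inside the super-structure."""
    total = 0
    for choice in itertools.product((None, 0, 1), repeat=len(super_edges)):
        # None = edge absent, 0 = a -> b, 1 = b -> a
        arcs = [(a, b) if c == 0 else (b, a)
                for (a, b), c in zip(super_edges, choice) if c is not None]
        total += is_acyclic(n, arcs)
    return total


if __name__ == "__main__":
    print([count_dags(n) for n in range(7)])  # [1, 1, 3, 25, 543, 29281, 3781503]
    complete = list(itertools.combinations(range(4), 2))
    print(count_dags_within(4, complete))                  # 543
    print(count_dags_within(4, [(0, 1), (1, 2), (2, 3)]))  # 27
```

On four nodes the complete super-structure admits all 543 DAGs, while a path super-structure admits only $3^3 = 27$, because every orientation of a tree is acyclic; this is the kind of reduction a sparse super-structure buys.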


The Complexity of Morality: Checking Markov Blanket Consistency with DAGs via Morality

arXiv.org Machine Learning

A family of Markov blankets in a faithful Bayesian network satisfies the symmetry and consistency properties. In this paper, we establish a bijection between families of consistent Markov blankets and moral graphs. We define the new concepts of weak recursive simpliciality and perfect elimination kits, and we prove that both are equivalent to graph morality. In addition, we prove that morality can be decided in polynomial time for graphs with maximum degree less than $5$, but that the problem is NP-complete for graphs with higher maximum degrees.
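In a faithful Bayesian network, the Markov blanket of a node (its parents, children, and the children's other parents) is exactly its neighbourhood in the moral graph, and an undirected graph is moral if it is the moral graph of some DAG. The sketch below makes both notions concrete. `is_moral` decides morality by brute force over all orientations of subsets of the graph's edges (a DAG's skeleton is always contained in its moral graph), so it takes $3^{|E|}$ steps and is only usable on tiny graphs. It is a hypothetical baseline, not the polynomial-time procedure for maximum degree below $5$ described in the paper.

```python
import itertools


def _is_acyclic(n, arcs):
    """Kahn's algorithm on nodes 0..n-1."""
    indeg = [0] * n
    out = [[] for _ in range(n)]
    for u, v in arcs:
        out[u].append(v)
        indeg[v] += 1
    stack = [v for v in range(n) if indeg[v] == 0]
    removed = 0
    while stack:
        u = stack.pop()
        removed += 1
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return removed == n


def moralize(n, arcs):
    """Moral graph (set of 2-element frozensets) of the DAG with the given arcs."""
    parents = {v: set() for v in range(n)}
    for u, v in arcs:
        parents[v].add(u)
    edges = {frozenset(arc) for arc in arcs}  # drop directions
    for pa in parents.values():
        edges.update(frozenset(pq) for pq in itertools.combinations(sorted(pa), 2))  # marry parents
    return edges


def markov_blankets(n, arcs):
    """Markov blanket of each node = its neighbours in the moral graph."""
    mb = {v: set() for v in range(n)}
    for edge in moralize(n, arcs):
        a, b = tuple(edge)
        mb[a].add(b)
        mb[b].add(a)
    return mb


def is_moral(n, edges):
    """Brute force: is the undirected graph the moral graph of some DAG?"""
    target = {frozenset(e) for e in edges}
    edge_list = [tuple(sorted(e)) for e in target]
    for choice in itertools.product((None, 0, 1), repeat=len(edge_list)):
        arcs = [(a, b) if c == 0 else (b, a)
                for (a, b), c in zip(edge_list, choice) if c is not None]
        if _is_acyclic(n, arcs) and moralize(n, arcs) == target:
            return True
    return False


if __name__ == "__main__":
    print(markov_blankets(3, [(0, 2), (1, 2)])[0] == {1, 2})  # True: child 2 and co-parent 1
    print(is_moral(3, [(0, 1), (1, 2), (0, 2)]))              # True (e.g. 0 -> 2 <- 1)
    print(is_moral(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))      # False: chordless 4-cycle
```

A chordless cycle of length four or more is never moral: it contains no triangles, while every married pair forms a triangle with its common child, so the DAG would need the whole cycle as its skeleton, and any acyclic orientation of that cycle has a sink whose two non-adjacent parents would be married.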